Abstract

In  this  paper,  we  introduce  a  simple  new  set  of  techniques  for 
deriving  symmetric  and  positive  definite  secant  updates.  We  use  these 
techniques  to  present  a  simple  new  derivation  of  the  BFGS  update  using 
neither  matrix  inverses  nor  weighting  matrices.  A  related  derivation 
is  shown  to  generate  a  large  class  of  symmetric  rank-two  update 
formulas,  together  with  the  condition  for  each  to  preserve  positive 
definiteness.  We  apply  our  techniques  to  generate  a  new  projected 
BFGS  update,  and  indicate  applications  to  the  efficient  implementation 
of  secant  algorithms  via  the  Cholesky  factorization. 


1.  Introduction  and  Background 


In 1965, Broyden [2] published two apparently equally reasonable methods for generating Jacobian approximations $J_+ \in R^{n\times n}$ in a quasi-Newton method for solving $F(x) = 0$ whose basic step is

$$x_+ = x_c - J_c^{-1}F(x_c),$$

where $F: R^n \to R^n$, $x_c \in R^n$, and $J_c \in R^{n\times n}$ is nonsingular. The method which bears his name works very well and consists in taking

$$J_+ = J_c + \frac{(y - J_cs)s^T}{s^Ts}, \tag{1.1}$$

where $s = x_+ - x_c$ is the current step, and $y = F(x_+) - F(x_c)$ is the yield of this step. It is easy to show [7] that $J_+$ is nearest $J_c$ in the Frobenius norm $\|\cdot\|_F$ among all matrices in

$$Q(y,s) = \{J \in R^{n\times n} : Js = y\},$$

the generalized quotients of y by s.
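As a small numerical illustration of (1.1), the following sketch forms the Broyden update and compares it with another member of $Q(y,s)$. It assumes Python with NumPy; the helper name `broyden_update` and the sample data are hypothetical and are not part of the original development.

```python
import numpy as np

def broyden_update(J_c, s, y):
    """Good Broyden update (1.1): J_+ = J_c + (y - J_c s) s^T / (s^T s)."""
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=float)
    return J_c + np.outer(y - J_c @ s, s) / (s @ s)

# Hypothetical sample data.
J_c = np.array([[2.0, 0.0], [1.0, 3.0]])
s = np.array([1.0, -1.0])
y = np.array([0.5, 2.0])

J_plus = broyden_update(J_c, s, y)

# Another member of Q(y, s): add a matrix whose rows are orthogonal to s,
# so that the secant equation J s = y is left intact.
t = np.array([1.0, 1.0])                       # t^T s = 0
J_other = J_plus + np.outer([1.0, -2.0], t)

change_broyden = np.linalg.norm(J_plus - J_c, "fro")
change_other = np.linalg.norm(J_other - J_c, "fro")
```

Both matrices satisfy the secant equation, and the Broyden update makes the smaller Frobenius-norm change to `J_c`, in line with the least-change property cited from [7].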

Broyden's other method does not work so well, but it seems just as reasonable, since it is to choose

$$J_+ = J_c + \frac{(y - J_cs)\,y^TJ_c}{y^TJ_cs}, \tag{1.2}$$

or, equivalently,

$$J_+^{-1} = J_c^{-1} + \frac{(s - J_c^{-1}y)\,y^T}{y^Ty}, \tag{1.3}$$

the nearest matrix in $Q(s,y)$ to $J_c^{-1}$ in the Frobenius norm. These methods have basically the same good theoretical justifications.

Powell [17] and Greenstadt [15] defined symmetric analogs of these methods for the case when F is the gradient of some nonlinear functional $f: R^n \to R$. Now we are dealing with Hessian matrices, which we will denote by $H_c$, $H_+$, and so it seems desirable to have the approximation $H_+$ inherit symmetry from $H_c$. Again it seems as reasonable to minimize the change from $Q(s,y) \cap \{A : A = A^T\}$ to $H_c^{-1}$ as Greenstadt does, as to follow Powell and minimize the change to $H_c$ from candidate approximations in $Q(y,s) \cap \{A : A = A^T\}$. Once more, the theoretical justification is similar and good, but numerical experience favors Powell's symmetric form of (1.1).

There  are  various  reasons  why  it  has  been  thought  desirable  to 
maintain  positive  definiteness  as  well  as  symmetry  in  the  sequence 
of  approximate  Hessians  and  this  is  done,  when  possible,  by  the  DFP 
([4],  [10])  update  formula 

$$H_+ = H_c + \frac{(y - H_cs)y^T + y(y - H_cs)^T}{y^Ts} - \frac{s^T(y - H_cs)\,yy^T}{(y^Ts)^2} \tag{1.4}$$

or

$$H_+^{-1} = H_c^{-1} - \frac{H_c^{-1}y\,y^TH_c^{-1}}{y^TH_c^{-1}y} + \frac{ss^T}{y^Ts},$$

and also by the BFGS ([3], [9], [13], [19]) formula

$$H_+^{-1} = H_c^{-1} + \frac{(s - H_c^{-1}y)s^T + s(s - H_c^{-1}y)^T}{s^Ty} - \frac{y^T(s - H_c^{-1}y)\,ss^T}{(s^Ty)^2}$$

or

$$H_+ = H_c - \frac{H_cs\,s^TH_c}{s^TH_cs} + \frac{yy^T}{y^Ts}. \tag{1.5}$$

Since $s^Ty = s^THs$ for any $H \in Q(y,s)$, it is obvious that a necessary condition for $Q(y,s)$ to contain a positive definite matrix is $y^Ts > 0$. It is well-known that if $H_c$ is symmetric and positive definite, then $y^Ts > 0$ is sufficient to ensure that both (1.4) and (1.5) generate $H_+$ that inherit both properties. We will give a very simple short proof of this fact in Section 2.


Dennis and Moré [7] and Dennis and Schnabel [8] show that (1.4) and (1.5) are again least change updates. In this case, (1.4) defines the minimum change to $H_c$ to obtain $H_+ \in Q(y,s) \cap \{A : A = A^T\}$. The change is measured by $\|W(H_c - H_+)W^T\|_F$ where W is any nonsingular matrix for which $W^TW = M \in Q(s,y)$. Update (1.5) defines the least change to $H_c^{-1}$ from $Q(s,y) \cap \{A : A = A^T\}$ measured by $\|W^{-T}(H_c^{-1} - H_+^{-1})W^{-1}\|_F$. In this case, unlike the others, computational experience indicates that the BFGS, which makes the least weighted change to the inverse of $H_c$, outperforms the DFP, which makes the least weighted change to $H_c$.

These derivations are unsatisfying because they relate the good Broyden (1.1) to the less successful DFP (1.4) and the bad Broyden (1.3) to the more successful BFGS (1.5). In Section 2, we will give a new derivation of the BFGS directly from the good Broyden. This new derivation is invariably successful in the classroom. We also show how the DFP is derived from the bad Broyden. In Section 3, we show how the new derivation can be used to derive from the rank-one methods a large class of the symmetric rank-two secant updates that inherit positive definiteness. We also use this same technique to obtain a relationship between Oren's [16] sizing of the Hessian and hereditary positive definiteness. It enables us to coerce Powell's symmetric Broyden formula, and all the other rank-two updates we derive, into having this desirable property.

Section  4  is  devoted  to  applying  our  technique  to  the  derivation 
from  projected  rank-one  updates  of  the  projected  rank-two  updates 
of the type introduced by Davidon [5]. In particular, we derive a new
projected  BFGS  update  from  the  projected  Broyden  update  of  Gay  and 
Schnabel  [11].  In  Section  5,  we  relate  our  derivations  to  an  algorithm 
of  Goldfarb  [14]  for  updating  a  Cholesky  factorization  of  Hc. 


We hope that specialists will find the entire paper of interest, but we believe that Sections 2 and 5 should be of interest to anyone who teaches this material, since they constitute a quick and simple way to derive the BFGS update from the Broyden update in a form that leads directly to its Cholesky factorization implementation via the update of the LQ factorization. These methods are all the material on updates that really needs to be taught in a general numerical analysis course.


2.  The BFGS and DFP from the Good and Bad Broyden Methods

In this section, we will need the following very simple lemma characterizing when a symmetric positive definite matrix exists in $Q(y,s)$ for $y, s \in R^n$. This lemma is quite easy, and it will form the basis for our subsequent derivations.

Lemma 2.1: Let $y, s \in R^n$, s nonzero, and let $Q(y,s) = \{A \in R^{n\times n} : As = y\}$. Then $Q(y,s)$ contains a symmetric positive definite matrix if and only if, for some nonzero $v \in R^n$ and nonsingular $J \in R^{n\times n}$, $y = Jv$ and $v = J^Ts$.

Proof: If v and J exist then clearly $y = Jv = JJ^Ts$ and $JJ^T$ is the symmetric positive definite matrix we seek.

Now suppose A is a symmetric positive definite matrix with $y = As$. Let $A = LL^T$ be the Cholesky factorization of A and set $J = L$ and $v = L^Ts$ to complete the proof.

If we have a symmetric positive definite approximate Hessian $H_c$ and we want to obtain $H_+$, which inherits these properties as well as the property of incorporating the new problem information by being in $Q(y,s)$, then the preceding lemma guides us to a solution. We probably have a Cholesky factorization $H_c = L_cL_c^T$, and we know from the previous lemma that the sort of $H_+$ we desire exists if and only if we can find a v and $J_+$ such that $y = J_+v$ and $v = J_+^Ts$. It seems quite natural to think of trying to obtain $J_+$ from $L_c$, and in fact, we would hope to do this without making a larger change to $L_c$ than necessary, in order to preserve as much as possible of the information stored in $L_c$ which has been gathered as the iteration has proceeded. This motivates choosing $J_+$ by the following procedure.

BFGS Procedure

1.  Assuming we know $v \in R^n$, find the $J_+ \in R^{n\times n}$ which is nearest $L_c$ in the Frobenius norm and satisfies $J_+v = y$.

2.  Solve for v so that $J_+^Ts = v$.

The  proof  of  the  following  theorem  shows  that  the  solution  is  the  BFGS  update. 


Theorem 2.2: Let $L_c \in R^{n\times n}$ be nonsingular, $H_c = L_cL_c^T$, $y, s \in R^n$, s nonzero. There is a symmetric positive definite matrix $H_+ \in Q(y,s)$ if and only if $y^Ts > 0$. If there is such a matrix, then the BFGS update $H_+ = J_+J_+^T$ is one such, where

$$J_+ = L_c + \frac{(y - \alpha H_cs)(L_c^Ts)^T}{\alpha\,s^TH_cs}, \qquad \alpha = \pm\sqrt{\frac{y^Ts}{s^TH_cs}}, \tag{2.1}$$

and either the positive or negative square root may be taken.


Proof: Recall first from Lemma 2.1 that a necessary condition for the update to exist is that there exist nonzero $v \in R^n$ and nonsingular $J_+ \in R^{n\times n}$ such that $J_+v = y$ and $J_+^Ts = v$. Therefore

$$v^Tv = (J_+^Ts)^T(J_+^{-1}y) = s^Ty,$$

which shows that $s^Ty > 0$ is necessary.


Now we derive the BFGS update via the above procedure. If we knew v, then the nearest matrix to $L_c$ that sends v to y is just the Broyden update (1.1): in this setting,

$$J_+ = L_c + \frac{(y - L_cv)v^T}{v^Tv}.$$

Notice that this reduces the problem of determining the $n^2$ elements of $J_+$ to finding the n components of v. Now we use the condition that

$$v = J_+^Ts = L_c^Ts + \frac{y^Ts - v^TL_c^Ts}{v^Tv}\,v.$$

This implies that $v = \alpha L_c^Ts$ for some scalar $\alpha$, and so the problem of determining the n components of v is reduced to finding the scalar $\alpha$. Plugging back in, we see that

$$\alpha = 1 + \frac{y^Ts - \alpha\,s^TH_cs}{\alpha^2\,s^TH_cs}\,\alpha$$

or

$$\alpha^2 = \frac{y^Ts}{s^TH_cs}.$$

Therefore if $y^Ts > 0$, we have defined a symmetric and positive definite update in $Q(y,s)$.

We have now proved everything except the easily verified statement that $H_+$ defined by (1.5) is identical to $J_+J_+^T$ where $J_+$ is given by (2.1), no matter which sign is taken for the square root.
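The two steps of the BFGS procedure can be checked numerically. The sketch below assumes Python with NumPy; the function names `bfgs_factor` and `bfgs_direct` and the sample data are hypothetical. It builds $J_+$ from $v = \alpha L_c^Ts$ with $\alpha^2 = y^Ts / s^TH_cs$ and compares $J_+J_+^T$ with the product form of the BFGS update in (1.5).

```python
import numpy as np

def bfgs_factor(L_c, s, y, sign=1.0):
    """Factor J_+ of the BFGS update, following the two-step procedure."""
    H_s = L_c @ (L_c.T @ s)                  # H_c s with H_c = L_c L_c^T
    ys, sHs = y @ s, s @ H_s
    if ys <= 0.0:
        raise ValueError("y^T s must be positive")
    alpha = sign * np.sqrt(ys / sHs)         # either square root is allowed
    v = alpha * (L_c.T @ s)                  # step 2: v = alpha L_c^T s
    # Step 1: Broyden update of L_c that sends v to y.
    return L_c + np.outer(y - L_c @ v, v) / (v @ v)

def bfgs_direct(H_c, s, y):
    """BFGS update in the form H_c - H_c s s^T H_c / s^T H_c s + y y^T / y^T s."""
    H_s = H_c @ s
    return H_c - np.outer(H_s, H_s) / (s @ H_s) + np.outer(y, y) / (y @ s)

# Hypothetical sample data with y^T s = 0.9 > 0.
L_c = np.array([[2.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 1.0, 3.0]])
s = np.array([1.0, -0.5, 0.25])
y = np.array([0.8, 0.1, 0.6])

H_direct = bfgs_direct(L_c @ L_c.T, s, y)
J_pos = bfgs_factor(L_c, s, y, sign=1.0)
J_neg = bfgs_factor(L_c, s, y, sign=-1.0)
```

Under these assumptions both `J_pos @ J_pos.T` and `J_neg @ J_neg.T` should reproduce `H_direct`, which is the sign independence claimed at the end of the proof.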


This derivation has the satisfying property of connecting the good Broyden formula (1.1) and the BFGS method. Another alternative in using Lemma 2.1 to derive a symmetric and positive definite $H_+ \in Q(y,s)$ would be to first choose $J_+$ to satisfy

$$J_+^Ts = v \tag{2.2}$$

and then solve for v so that

$$J_+v = y.$$

The proof of Theorem 2.3 shows that if we do this, and choose $J_+^T$ in (2.2) to be the bad Broyden update (1.2) to $L_c^T$, the solution is the DFP update.


Theorem 2.3: Let $L_c$, $H_c$, s, and y satisfy the hypotheses of Theorem 2.2. There is a symmetric positive definite matrix $H_+ \in Q(y,s)$ if and only if $y^Ts > 0$. If there is such a matrix, then the DFP update $H_+ = J_+J_+^T$ is one such, where

$$J_+^T = L_c^T + \frac{\left(\beta L_c^{-1}y - L_c^Ts\right)y^T}{y^Ts}, \qquad \beta = \pm\sqrt{\frac{y^Ts}{y^TH_c^{-1}y}},$$

for either sign of the square root.

Proof: Let us return to the derivational proof of Theorem 2.2. If we decide, given the intermediate vector v, that we will obtain $J_+^T$ from (1.2) via

$$J_+^T = L_c^T + \frac{(v - L_c^Ts)\,v^TL_c^T}{v^TL_c^Ts}$$

to satisfy (2.2), then the equation for $y = J_+v$ is

$$y = J_+v = L_cv + L_cv\left(\frac{v^Tv - s^TL_cv}{v^TL_c^Ts}\right),$$

so $v = \beta L_c^{-1}y$ for some scalar $\beta$ and, plugging back in,

$$J_+^T = L_c^T + \frac{(v - L_c^Ts)\,y^T}{y^Ts}, \tag{2.3}$$

$$v = \pm\sqrt{\frac{y^Ts}{y^TH_c^{-1}y}}\;L_c^{-1}y. \tag{2.4}$$

Again if $y^Ts > 0$, we have derived a symmetric and positive definite update in $Q(y,s)$. It is easily verified that if $J_+$ is defined by (2.3) and (2.4), then $J_+J_+^T$ is the DFP update given by (1.4).
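The "easily verified" identity can also be checked numerically. The sketch below assumes Python with NumPy; the names `dfp_factor` and `dfp_direct` and the sample data are hypothetical. It forms $J_+$ from (2.3) and (2.4) and compares $J_+J_+^T$ with the DFP formula (1.4).

```python
import numpy as np

def dfp_factor(L_c, s, y, sign=1.0):
    """Factor J_+ of the DFP update built from (2.3) and (2.4)."""
    ys = y @ s
    if ys <= 0.0:
        raise ValueError("y^T s must be positive")
    w = np.linalg.solve(L_c, y)              # w = L_c^{-1} y, so w^T w = y^T H_c^{-1} y
    beta = sign * np.sqrt(ys / (w @ w))
    v = beta * w                             # (2.4)
    J_T = L_c.T + np.outer(v - L_c.T @ s, y) / ys    # (2.3)
    return J_T.T, v

def dfp_direct(H_c, s, y):
    """DFP update in the form (1.4)."""
    r = y - H_c @ s
    ys = y @ s
    return (H_c + (np.outer(r, y) + np.outer(y, r)) / ys
            - (s @ r) * np.outer(y, y) / ys**2)

# Hypothetical sample data with y^T s = 0.9 > 0.
L_c = np.array([[2.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 1.0, 3.0]])
s = np.array([1.0, -0.5, 0.25])
y = np.array([0.8, 0.1, 0.6])

H_dfp = dfp_direct(L_c @ L_c.T, s, y)
J_pos, v_pos = dfp_factor(L_c, s, y, sign=1.0)
J_neg, v_neg = dfp_factor(L_c, s, y, sign=-1.0)
```

For either sign, the factor should satisfy $J_+^Ts = v$ and $J_+v = y$, which are the two conditions of Lemma 2.1.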


3.  Hereditary Positive Definiteness and Oren Sizing For Symmetric
Rank-Two  Updates 

In the last section, we followed two different tacks in our derivations. Assuming that we had v, for the BFGS we updated $L_c$ to $J_+$, and for the DFP, $L_c^T$ to $J_+^T$. Then in each case, we obtained v from a requirement on the transpose of the updated factor. In this section, we will generalize our derivations to include scaling matrices. The BFGS derivation turns out to be largely invariant to scaling. On the other hand, the generalization of the DFP derivation turns out to yield a large class of symmetric rank-two update formulas, including the PSB in the unweighted case, as well as the condition for each to inherit positive definiteness from $H_c$.

Our second interest in this section is the relationship between Oren's [16] sizing and hereditary positive definiteness of symmetric rank-two updates. Oren's sizing consists of first multiplying $H_c$ by a constant $\sigma^2$ and then updating $\sigma^2H_c$ to $H_+$. Our generalization of the DFP derivation will lead naturally to a range of sizing factors $\sigma^2$ which make the PSB update of a sized positive definite matrix be positive definite. A similar result holds for any update obtained via the DFP derivation.

Let us consider first the "BFGS procedure" from the last section, but with scaling matrices. We want $H_+ = J_+J_+^T$ and we assume we have $H_c = L_cL_c^T$. Given nonsingular $W_L$ and $W_R$ in $R^{n\times n}$, we consider the procedure

1.  Assuming we know $v \in R^n$, choose $J_+$ to solve

$$\min_{J_+ \in Q(y,v)} \|W_L(J_+ - L_c)W_R\|_F. \tag{3.1}$$

2.  Solve for v so that $J_+^Ts = v$.

The BFGS update came from this procedure with $W_L = W_R = I$. Note that if we are approximating the Hessian, $W_L$ corresponds to a linear transformation of the variable space by $W_L^{-T}$, but $W_R$ has no natural interpretation.


It is well-known ([8], Corr. 2.3) that, for $M = W_R^{-T}W_R^{-1}$,

$$J_+ = L_c + \frac{(y - L_cv)(Mv)^T}{v^TMv} \tag{3.2}$$

solves (3.1) independent of $W_L$. Thus, we can say that the BFGS update results from the above procedure with any $W_L$ and $M = I$. Furthermore, $W_R$ can be any unitary matrix without changing the result. It actually turns out that the BFGS results from any $W_L$ and any $W_R$ for which $L_c^Ts$ is an eigenvector of M. We postpone this and the development for general $W_R$ to the appendix since we can think of no reason to choose any $W_R$ or M other than I.

There would have been good choices of $W_L$, e.g., $W_L^TW_L \in Q(s,y)$, since this corresponds to scaling $\hat y = W_Ly$ and $\hat s = W_L^{-T}s$ so that $\hat s = W_L^{-T}s = W_L^{-T}(W_L^TW_Ly) = W_Ly = \hat y$ and $J_+ = I$ is feasible. While our BFGS derivation was invariant under such scalings, the situation reverses when we introduce scaling into the DFP derivation.

The generalization of the "DFP procedure" is to select nonsingular matrices $W_L$ and $W_R$, assume that we know $v = J_+^{-1}y$, choose $J_+^T$ to solve

$$\min_{J_+^T \in Q(v,s)} \|W_L(J_+^T - L_c^T)W_R\|_F, \tag{3.3}$$

and then solve for v from

$$y = J_+v. \tag{3.4}$$

Notice that in this case the role of the scaling matrices is reversed; $W_R$ corresponds to a transformation of the variable space by $W_R^{-1}$, while $W_L$ has no obvious justification.

As before, we see that for $M = (W_RW_R^T)^{-1}$,

$$J_+^T = L_c^T + \frac{(v - L_c^Ts)(Ms)^T}{s^TMs} \tag{3.5}$$

solves (3.3) just as (3.2) solves (3.1). Again the answer is independent of $W_L$, but this time it eliminates the scale matrix that we don't know how to choose. We will finish carrying through the second procedure for general $W_R$ or M, but first we state the result.

Proposition 3.1: Let $L_c$, $H_c$, s and y satisfy the hypotheses of Theorem 2.2. The result of the procedure outlined by (3.3), (3.4), and (3.5) is

$$H_+ = J_+J_+^T = H_c + \frac{(y - H_cs)(Ms)^T + (Ms)(y - H_cs)^T}{s^TMs} - \frac{s^T(y - H_cs)\,Mss^TM}{(s^TMs)^2}, \tag{3.6}$$

where $J_+$ is given by (3.5) and

$$v = L_c^{-1}(y + \alpha Ms) \tag{3.7}$$

for either root $\alpha$ of

$$\alpha^2s^TMH_c^{-1}Ms + 2\alpha\,s^TMH_c^{-1}y + y^TH_c^{-1}y - s^Ty = 0. \tag{3.8}$$

If

$$(s^TMH_c^{-1}y)^2 \ge (s^TMH_c^{-1}Ms)(y^TH_c^{-1}y - s^Ty), \tag{3.9}$$

then $J_+$ is a real matrix and $H_+$ is positive definite.

Proof: Again we proceed in a derivational manner beginning with (3.5) and then (3.4),

$$y = J_+v = L_cv + (Ms)\,\frac{v^Tv - s^TL_cv}{s^TMs}. \tag{3.10}$$

Thus, for some $\alpha$, $y + \alpha Ms = L_cv$. Direct substitution into (3.10) shows that (3.4) is satisfied if and only if $\alpha$ is chosen so that

$$y^Ts = v^Tv = (L_c^{-1}y + \alpha L_c^{-1}Ms)^T(L_c^{-1}y + \alpha L_c^{-1}Ms) = y^TH_c^{-1}y + 2\alpha\,s^TMH_c^{-1}y + \alpha^2s^TMH_c^{-1}Ms.$$

This is equivalent to $\alpha$ being a root of (3.8), which has real roots if and only if (3.9) holds. Clearly, if v and $J_+$ are defined by a real $\alpha$, then $H_+$ is positive definite. It is straightforward to show that $H_+$ is real in any case and is given by (3.6).
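The scaled procedure of Proposition 3.1 can be traced numerically. The sketch below assumes Python with NumPy and a symmetric M; the names `scaled_factor` and `rank_two_direct`, the choice of M, and the sample data are hypothetical. The data are chosen so that $y^TH_c^{-1}y < s^Ty$, which makes (3.9) hold.

```python
import numpy as np

def scaled_factor(L_c, s, y, M, root=1.0):
    """Factor J_+ from (3.5), with v from (3.7) and alpha a root of (3.8)."""
    H_c = L_c @ L_c.T
    Ms = M @ s
    a = Ms @ np.linalg.solve(H_c, Ms)        # s^T M H_c^{-1} M s
    c = Ms @ np.linalg.solve(H_c, y)         # s^T M H_c^{-1} y
    b = y @ np.linalg.solve(H_c, y) - s @ y  # y^T H_c^{-1} y - s^T y
    disc = c * c - a * b                     # nonnegative exactly when (3.9) holds
    if disc < 0.0:
        raise ValueError("condition (3.9) fails: no real factor")
    alpha = (-c + root * np.sqrt(disc)) / a
    v = np.linalg.solve(L_c, y + alpha * Ms)             # (3.7)
    J_T = L_c.T + np.outer(v - L_c.T @ s, Ms) / (s @ Ms) # (3.5)
    return J_T.T, v

def rank_two_direct(H_c, s, y, M):
    """Symmetric rank-two update (3.6) for symmetric M."""
    Ms, r, d = M @ s, y - H_c @ s, s @ (M @ s)
    return (H_c + (np.outer(r, Ms) + np.outer(Ms, r)) / d
            - (s @ r) * np.outer(Ms, Ms) / d**2)

# Hypothetical sample data.
L_c = np.array([[2.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 1.0, 3.0]])
H_c = L_c @ L_c.T
s = np.array([1.0, 1.0, 1.0])
y = 0.5 * (H_c @ s) + np.array([0.1, -0.1, 0.0])
M = np.diag([1.0, 2.0, 3.0])

H_plus = rank_two_direct(H_c, s, y, M)
J_a, v_a = scaled_factor(L_c, s, y, M, root=1.0)
J_b, v_b = scaled_factor(L_c, s, y, M, root=-1.0)
```

Setting `M = np.eye(3)` in this sketch gives the PSB update, and either root of (3.8) should lead to the same $H_+$.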


It  is  shown  in  [18]  that  the  class  of  matrices  (3.6)  is 
equivalent  to  the  set  of  all  symmetric  rank-two  updates  that  can 
be  represented  as  the  difference  of  two  symmetric  rank-one  updates. 

It  should  also  be  noted  that  the  scaling  used  above  corresponds  exactly 
to the scaling used by Dennis and Moré [7] and Dennis and Schnabel [8]
in  their  least  change  derivations  of  the  same  class  of  updates. 

Now  we  give  the  relationship  of  hereditary  positive  definiteness 
to  Oren's  sizing.  The  proof  is  obvious. 


Corollary 3.2: Let M and $H_c = L_cL_c^T$ be symmetric positive definite matrices and let $s, y \in R^n$ with $s^Ty > 0$. If $\sigma$ is any number for which

$$\sigma^2 > \frac{(s^TMH_c^{-1}Ms)(y^TH_c^{-1}y) - (s^TMH_c^{-1}y)^2}{(s^TMH_c^{-1}Ms)\,y^Ts}, \tag{3.11}$$

then (3.6) applied to $\sigma^2H_c = (\sigma L_c)(\sigma L_c)^T$ defines a symmetric positive definite $H_+$, (3.9) is a strict inequality for $\sigma^2H_c$, v defined by (3.7) for $\sigma L_c$ is real, and $J_+$ defined by (3.5) is a real matrix with $H_+ = J_+J_+^T$.
It is interesting to note that if $\sigma = 1$ satisfies (3.11), then $H_+$ inherits positive definiteness directly from $H_c$, but that $\sigma^2 = y^TH_c^{-1}y / y^Ts$, one of Oren's recommended choices, always satisfies (3.11) and is independent of M and $W_R$.

We  complete  the  section  by  specializing  Theorem  3.2  to  the  PSB, 
DFP,  and  BFGS  formulas. 

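Corollary 3.3: Let $L_c$, $H_c$, s, y satisfy the hypothesis of Corollary 3.2 and let

$$\sigma^2 > \frac{(s^TH_c^{-1}s)(y^TH_c^{-1}y) - (s^TH_c^{-1}y)^2}{(s^TH_c^{-1}s)(y^Ts)}.$$

Then the PSB update of $\sigma^2H_c$,

$$H_+ = \sigma^2H_c + \frac{(y - \sigma^2H_cs)s^T + s(y - \sigma^2H_cs)^T}{s^Ts} - \frac{s^T(y - \sigma^2H_cs)\,ss^T}{(s^Ts)^2},$$

is a positive definite matrix, and $H_+ = J_+J_+^T$, where

$$J_+^T = \sigma L_c^T + \frac{(v - \sigma L_c^Ts)s^T}{s^Ts},$$

$$v = \sigma^{-1}L_c^{-1}(y + \alpha s),$$

$$\alpha = \frac{-s^TH_c^{-1}y \pm \sqrt{(s^TH_c^{-1}y)^2 - (s^TH_c^{-1}s)(y^TH_c^{-1}y - \sigma^2s^Ty)}}{s^TH_c^{-1}s}$$

are all real.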


Proof: The proof follows from the quadratic formula and the fact that (3.6) with $M = I = W_R$ is the PSB update.


As we discussed earlier, other than the identity, the obvious scaling to try is $M = (W_RW_R^T)^{-1} \in Q(y,s)$. The result is the DFP formula. The following is straightforward.

Corollary 3.4: Let $L_c$, $H_c$, s, y satisfy the hypothesis of Corollary 3.2, and let $\sigma^2$ be any positive number. Then the DFP update $H_+$ of $\sigma^2H_c$ is positive definite and


The following corollary is not so obvious, but it is perhaps the most interesting of all. It consists in applying a scaling from [18] to obtain the BFGS update from the same derivation as the DFP and PSB.


Corollary 3.5: Let $L_c$, $H_c$, s, y satisfy the hypothesis of Theorem 3.2. Then for any

$$M \in Q\left(\gamma\left(y + \sqrt{\frac{y^Ts}{s^TH_cs}}\,H_cs\right),\; s\right)$$

and any scalar $\gamma$, (3.6) defines the BFGS update $H_+$ of $H_c$. The BFGS update of any $\sigma^2H_c$ is positive definite for any real $\sigma$.

Proof: First notice that (3.6) is independent of scalar multiples of M and then plug and grind. Take (2.1) with its unspecified sign on the radical and equate its transpose to (3.5).


The interesting thing to note here is that, by taking any DFP scaling $\bar M \in Q(y,s)$, the BFGS scaling is

$$M = \frac{\bar M + \omega H_c}{1 + \omega}, \qquad \omega = \sqrt{\frac{y^Ts}{s^TH_cs}},$$

which is a convex combination of the DFP scaling and the current scaling. In fact, if the conditions of Dennis and Moré [6] for q-superlinear convergence are met, it is easy to show that M asymptotically approaches $(\bar M + H_c)/2$.


4.  A  Projected  BFGS  from  the  Projected  Broyden  Update 


Davidon  [5]  modified  the  standard  symmetric  rank-two  update 
formulas  in  an  attempt  to  satisfy  the  current  secant  condition  H+s  =  y 
without  doing  more  than  necessary  damage  to  past  secant  conditions. 

We will introduce some notation in order to state the problem. Let $\{s_1, \ldots, s_m\} \subset R^n$, assume s is linearly independent of the space spanned by the $s_i$'s, and consider the following problem:

Given $H_c = L_cL_c^T$, $s, y \in R^n$ with $y^Ts > 0$, find $H_+ = J_+J_+^T$ such that

$$H_+s = y \quad\text{and}\quad H_+s_i = H_cs_i, \quad i = 1, 2, \ldots, m. \tag{4.1}$$

The $s_i$ can be interpreted as past steps and s as the current step. Schnabel [18] proved that a solution is possible if and only if

$$(y - H_cs)^Ts_i = 0, \quad i = 1, 2, \ldots, m. \tag{4.2}$$

Gay and Schnabel [11] gave a projected form of Broyden's update which satisfies (4.1) in the case when $H_c$ and $H_+$ are not required to be symmetric. In this section we will use a form of Gay and Schnabel's update in place of Broyden's update in the BFGS derivation of Section 2. The result will be a new projected BFGS formula which agrees with Davidon's version for quadratic functionals. Our formula will satisfy (4.1) for every $s_i$ that satisfies (4.2), but it will also have a fairly sensible partial version of (4.1) for all the $s_i$.

The procedure we will follow to derive the projected BFGS update is the following. Once again we assume we have $H_c = L_cL_c^T$, and we want $H_+ = J_+J_+^T$.

Projected BFGS Procedure

1)  Assuming we know $v \in R^n$, choose $J_+$ to solve

$$\min_{J_+ \in Q(y,v)} \|J_+ - L_c\|_F$$

subject to

$$(J_+ - L_c)L_c^Ts_i = 0, \quad i = 1, \ldots, m. \tag{4.3}$$

2)  Solve for v so that $J_+^Ts = v$.

This procedure is carried out in the proof of Theorem 4.5. It differs from the "BFGS procedure" of Section 2 only in the addition of condition (4.3). In Lemmas 4.1 - 4.4 we justify this condition. Essentially, Lemmas 4.1 and 4.2 show that the condition $(J_+ - L_c)L_c^Ts_i = 0$ is half of a necessary and sufficient condition for any "reasonable" update to satisfy

$$(J_+J_+^T - L_cL_c^T)s_i = 0. \tag{4.4}$$

The other half is $(J_+ - L_c)^Ts_i = 0$. Lemma 4.4 shows that the above procedure is guaranteed to produce an $H_+ = J_+J_+^T$ which satisfies (4.4) whenever this is consistent with $H_+s = y$. We will state the following lemmas in terms of matrices $J_+$ and $L_c$ and vector $s_i$ for ease in referring to them later, but the lemmas will contain explicit hypotheses and no other assumptions, such as $L_c$ being lower triangular, are meant to be implied by the notation.

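Lemma 4.1: Let $L_c, J_+ \in R^{n\times n}$, $s_i \in R^n$. If

$$(J_+ - L_c)L_c^Ts_i = 0 \tag{4.5}$$

and

$$(J_+ - L_c)^Ts_i = 0, \tag{4.6}$$

then

$$(J_+J_+^T - L_cL_c^T)s_i = 0. \tag{4.7}$$

Proof: The proof follows from the identity

$$J_+J_+^T - L_cL_c^T = (J_+ - L_c)(J_+ - L_c)^T + L_c(J_+ - L_c)^T + (J_+ - L_c)L_c^T. \tag{4.8}$$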


Lemma 4.2: Let the hypotheses of Lemma 4.1 hold, and assume in addition that $L_c$ is nonsingular. Then (4.7) and

$$\mathrm{rank}(J_+J_+^T - L_cL_c^T) \ge 2\,\mathrm{rank}(J_+ - L_c) - 1$$

imply that (4.5) and (4.6) hold.

Proof: The proof will consist in showing that if (4.7) holds, then either (4.5) and (4.6) hold or the hypothesized rank condition does not hold. First we regroup terms in (4.8) to obtain

$$J_+J_+^T - L_cL_c^T = J_+(J_+ - L_c)^T + (J_+ - L_c)L_c^T. \tag{4.9}$$

We see immediately that if (4.7) holds, then (4.6) implies (4.5). Now again from (4.8),

$$s_i^T(J_+J_+^T - L_cL_c^T)s_i = \|(J_+ - L_c)^Ts_i\|_2^2 + 2s_i^T(J_+ - L_c)L_c^Ts_i,$$

and so if (4.7) holds, then (4.5) and (4.6) are equivalent.

Now suppose that neither (4.5) nor (4.6) holds. Since $L_c$ is nonsingular, let $k = \mathrm{rank}(J_+ - L_c) = \mathrm{rank}(L_c(J_+ - L_c)^T)$. Again from (4.8),

$$\mathrm{rank}(J_+J_+^T - L_cL_c^T) = 2k - (a + b),$$

where

a = dim [(row space of $J_+ - L_c$) $\cap$ (row space of $L_c(J_+ - L_c)^T$)],

b = dim $\{z \in R^n : (J_+J_+^T - L_cL_c^T)z = 0$ and $(J_+ - L_c)^Tz \ne 0 \ne (J_+ - L_c)L_c^Tz\}$.

Since we are supposing (4.7) but neither (4.5) nor (4.6), $b \ge 1$. Now we transpose (4.9) and obtain, from (4.7),

$$0 = L_c(J_+ - L_c)^Ts_i + (J_+ - L_c)J_+^Ts_i.$$

Using this and the fact that $L_c(J_+ - L_c)^Ts_i \ne 0$ because (4.6) doesn't hold and $L_c$ is nonsingular, we see that $a \ge 1$. Thus, $\mathrm{rank}(J_+J_+^T - L_cL_c^T) \le 2k - 2$.

The rank condition in Lemma 4.2 is required to exclude "unreasonable updates" such as $J_+ = L_cQ$, Q orthogonal, which satisfy (4.7) without satisfying (4.5) or (4.6). In the case when $J_+$ is a rank-one update to $L_c$ we have the following easy corollary.

Corollary 4.3: Let $L_c$, $J_+$, $s_i$ obey the hypotheses of Lemma 4.2. If $\mathrm{rank}(J_+ - L_c) = 1$, and $J_+J_+^T \ne L_cL_c^T$, then (4.7) is equivalent to (4.5) and (4.6).

Proof: From Lemma 4.2, (4.7) implies (4.5) and (4.6). Lemma 4.1 is the converse.


Now we show that we can expect the result of the Projected BFGS Procedure to satisfy (4.6), and hence (4.7), for any $s_i$ for which (4.2) is true.

Lemma 4.4: Let $L_c \in R^{n\times n}$ be nonsingular, $J_+ \in R^{n\times n}$, $s, s_i, y \in R^n$, and let (4.5) hold. Set $H_c = L_cL_c^T$ and $v = J_+^Ts$. If (4.2) holds for $s_i$, then $(y - L_cv)^Ts_i = 0$. If $J_+v = y$, $\mathrm{rank}(J_+ - L_c) = 1$, and $s^Ty \ne y^TH_c^{-1}y$ also hold, then (4.7) holds.

Proof: First we note that

$$(y - H_cs)^Ts_i - (y - L_cv)^Ts_i = (L_cv - H_cs)^Ts_i = (L_cJ_+^Ts - L_cL_c^Ts)^Ts_i = s^T(J_+ - L_c)L_c^Ts_i,$$

and so (4.2) and (4.5) imply $(y - L_cv)^Ts_i = 0$. If we assume that $J_+v = y$,

$$0 = (y - L_cv)^Ts_i = v^T(J_+ - L_c)^Ts_i, \tag{4.10}$$

but since $\mathrm{rank}(J_+ - L_c) = 1$, for some $w_1, w_2 \in R^n$, $(J_+ - L_c)^T = w_1w_2^T$ and (4.10) becomes

$$0 = v^Tw_1\,w_2^Ts_i.$$

Thus, either $(J_+ - L_c)v = 0$ or $(J_+ - L_c)^Ts_i = 0$. If $0 = (J_+ - L_c)v = y - L_cv$, then $v = L_c^{-1}y$ and $y^Ts = (J_+v)^Ts = v^TJ_+^Ts = v^Tv = y^TH_c^{-1}y$, which contradicts the hypothesis. This means that (4.6) must hold, and since we have assumed (4.5), (4.7) must hold by Corollary 4.3.


Now we derive the new projected BFGS update. We let $\delta_{ij}$ denote the Kronecker delta.

Theorem 4.5: Let $L_c \in R^{n\times n}$ be nonsingular, $H_c = L_cL_c^T$, and let $\{s, y, s_1, \ldots, s_m\} \subset R^n$, s linearly independent of the space spanned by $\{s_1, \ldots, s_m\}$. Assume without loss of generality that $s_i^TH_cs_j = \delta_{ij}$. Define

$$\bar s = s - \sum_{i=1}^m s_i(s_i^TH_cs)$$

and

$$\bar y = y - \sum_{i=1}^m H_cs_i(s_i^TH_cs).$$

Set

$$J_+ = L_c + \frac{(\bar y - \alpha H_c\bar s)(\alpha L_c^T\bar s)^T}{s^T\bar y} \tag{4.11}$$

for $\alpha^2 = \dfrac{s^T\bar y}{\bar s^TH_c\bar s}$, and define $H_+ = J_+J_+^T$. Then

$$H_+ = H_c + \frac{\bar y\bar y^T}{s^T\bar y} - \frac{H_c\bar s\,\bar s^TH_c}{\bar s^TH_c\bar s}, \tag{4.12}$$

$H_+s = y$, $(J_+ - L_c)L_c^Ts_i = 0$, $i = 1, 2, \ldots, m$, and $J_+$ is real if $s^T\bar y > 0$. If $(y - H_cs)^Ts_i = 0$ for any $i = 1, \ldots, m$, then $(H_+ - H_c)s_i = 0$.

Proof: The proof consists mainly of the derivation of update (4.12) via the procedure outlined earlier.

From Theorem 2.1 of [11], the solution to step 1 of the projected BFGS procedure is

$$J_+ = L_c + \frac{(y - L_cv)\bar v^T}{\bar v^Tv}, \tag{4.13}$$

where

$$\bar v = v - \sum_{i=1}^m L_c^Ts_i(v^TL_c^Ts_i). \tag{4.14}$$

Thus step 2 of the procedure requires that

$$v = L_c^Ts + \frac{y^Ts - v^TL_c^Ts}{\bar v^Tv}\,\bar v, \tag{4.15}$$

which, by (4.14), implies

$$v = \alpha L_c^Ts + \sum_{i=1}^m \beta_iL_c^Ts_i \tag{4.16}$$

for some scalars $\alpha, \beta_1, \ldots, \beta_m$. Now from (4.14) and $s_i^TH_cs_j = \delta_{ij}$, we see that $\bar v^TL_c^Ts_i = 0$ for every i, so from (4.15) followed by (4.16), we have for every i,

$$(L_c^Ts)^TL_c^Ts_i = v^TL_c^Ts_i = \alpha(L_c^Ts)^TL_c^Ts_i + \beta_i,$$

$$\beta_i = (1 - \alpha)(L_c^Ts)^TL_c^Ts_i.$$

This allows us to rewrite (4.16) as

$$v = \alpha\left[L_c^Ts - \sum_{i=1}^m L_c^Ts_i(L_c^Ts)^TL_c^Ts_i\right] + \sum_{i=1}^m L_c^Ts_i(L_c^Ts)^TL_c^Ts_i = \alpha r + z,$$

where r and z are defined in the obvious way and $r = L_c^T\bar s$. Notice that $\bar v = v - z$, so $\bar v = \alpha r$ and we only need find $\alpha$ to have $\bar v$ and hence v. Note also that

$$r^Tz = \bar v^Tz/\alpha = 0,$$

since $\bar v^TL_c^Ts_i = 0$ for all i.

To find $\alpha$, direct substitution shows that as in the proof of Proposition 3.1, (4.15) is satisfied if and only if

$$s^Ty = v^Tv = \alpha^2r^Tr + 2\alpha r^Tz + z^Tz = \alpha^2r^Tr + z^Tz.$$

Thus

$$s^Ty - z^Tz = \alpha^2\,\bar s^TL_cL_c^T\bar s,$$

$$s^Ty - \sum_{i=1}^m\left[(L_c^Ts)^TL_c^Ts_i\right]^2 = \alpha^2\,\bar s^TH_c\bar s,$$

$$s^Ty - \sum_{i=1}^m(s^TH_cs_i)^2 = \alpha^2\,\bar s^TH_c\bar s,$$

$$s^T\bar y = \alpha^2\,\bar s^TH_c\bar s,$$

and

$$\alpha^2 = \frac{s^T\bar y}{\bar s^TH_c\bar s}.$$

Next we show that (4.13) reduces to (4.11). Using $\bar v = \alpha r$, $r^Tz = 0$, $r^Tr = \bar s^TH_c\bar s$, and the value we have just found for $\alpha$,

$$\bar v^Tv = \alpha r^Tv = \alpha r^T(\alpha r + z) = \alpha^2r^Tr = s^T\bar y.$$

Also, by the definition of v, $\bar y$, and z, and $r = L_c^T\bar s$,

$$y - L_cv = y - \alpha L_cr - L_cz = y - \alpha H_c\bar s - \sum_{i=1}^m H_cs_i(s_i^TH_cs) = \bar y - \alpha H_c\bar s,$$

and so (4.13) becomes

$$J_+ = L_c + \frac{(\bar y - \alpha H_c\bar s)(\alpha L_c^T\bar s)^T}{s^T\bar y},$$

which is (4.11). Notice that $\alpha$ and $J_+$ are real if $\bar y^Ts > 0$. Equation (4.12) is obtained by algebra from (4.11).

To complete the proof, notice that if $0 = (y - H_cs)^Ts_i$ holds for any $s_i$, then $(H_+ - H_c)s_i = 0$ from Lemma 4.4.
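The statement of Theorem 4.5 can be exercised on a small example. The sketch below assumes Python with NumPy; the helper names `h_orthonormalize` and `projected_bfgs` and all sample data are hypothetical. The data are built so that (4.2) holds for the first past step but not for the second, and so that $s^T\bar y > 0$.

```python
import numpy as np

def h_orthonormalize(H, steps):
    """Gram-Schmidt in the H inner product, so that s_i^T H s_j = delta_ij."""
    out = []
    for p in steps:
        q = np.array(p, dtype=float)
        for t in out:
            q = q - t * (t @ H @ q)
        out.append(q / np.sqrt(q @ H @ q))
    return out

def projected_bfgs(L_c, s, y, past):
    """Return (J_+, H_+) from (4.11) and (4.12); `past` must be H_c-orthonormal."""
    H_c = L_c @ L_c.T
    coef = [t @ H_c @ s for t in past]                       # s_i^T H_c s
    s_bar = s - sum(c * t for c, t in zip(coef, past))
    y_bar = y - sum(c * (H_c @ t) for c, t in zip(coef, past))
    d = s @ y_bar
    if d <= 0.0:
        raise ValueError("s^T y_bar must be positive for a real factor")
    Hs_bar = H_c @ s_bar
    alpha = np.sqrt(d / (s_bar @ Hs_bar))
    J = L_c + np.outer(y_bar - alpha * Hs_bar, alpha * (L_c.T @ s_bar)) / d
    H_plus = (H_c + np.outer(y_bar, y_bar) / d
              - np.outer(Hs_bar, Hs_bar) / (s_bar @ Hs_bar))
    return J, H_plus

# Hypothetical sample data.
L_c = np.array([[2.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 1.0, 3.0]])
H_c = L_c @ L_c.T
past = h_orthonormalize(H_c, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
s = np.array([1.0, 1.0, 1.0])
# y - H_c s = (0, 1, 1) is orthogonal to past[0] but not to past[1].
y = H_c @ s + np.array([0.0, 1.0, 1.0])

J_plus, H_plus = projected_bfgs(L_c, s, y, past)
```

In this example the new secant condition $H_+s = y$ holds, the old secant condition is kept for `past[0]`, and it is lost for `past[1]`, for which (4.2) fails.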
It is straightforward to confirm that (4.12) agrees with Davidon's projected BFGS formula when $f$ is a positive definite quadratic function, but not necessarily otherwise. Schnabel is currently testing an algorithm using the above projected BFGS update; the results will be reported elsewhere. Finally, we note that in analogy to the weighted DFP derivation of Section 3, an entire class of projected rank-two updates can be derived using the procedure (3.3) - (3.4) with the condition

$$(J_+ - L_c)^T s_i = 0, \quad i = 1, \ldots, m$$

added to (3.3).
5.  Updating  Cholesky  Factors


Finally we discuss the efficient sequencing of Cholesky factorizations in algorithms that use the update formulas derived in this paper. All the algorithms of this section have already been suggested by Goldfarb [14] using the Brodlie, Gourlay, and Greenstadt [1] factored form of the BFGS and DFP updates and the orthogonal decomposition update ideas of Gill, Golub, Murray, and Saunders [12]. Our purpose is to point out that they follow very naturally from the preceding derivations.

We will focus on the BFGS formula since the development for the others is similar. We assume we have $L_c$, the lower triangular Cholesky factor of the current Hessian approximation, and that


(5.1) 


from (2.1). Now we want the Cholesky factorization $L_+ L_+^T$ of $H_+ = J_+ J_+^T$. However, (5.1) is an especially handy form for the algorithms of [12] in which we are given

$$J_+ = L_c Q_c + w z^T$$

or

$$J_+ = L_c D_c V_c + w z^T$$

and find

$$J_+ = L_+ Q_+ \tag{5.2}$$

or

$$J_+ = L_+ D_+ V_+ \tag{5.3}$$

respectively, in a small multiple of $n^2$ operations. (Here $Q$ and $V$ denote matrices with orthogonal columns and $D$ a diagonal matrix.) Equation (5.1) is handy because, since $Q_c = V_c = I$, the $n^2$ work ordinarily necessary to obtain $Q_c z$ or $V_c z$ as a first step to obtaining $L_+$ is not needed.

It is also unnecessary to accumulate $Q_+$ or $V_+$. From (5.2) or (5.3),

$$H_+ = J_+ J_+^T = L_+ Q_+ Q_+^T L_+^T = L_+ L_+^T$$

or

$$H_+ = J_+ J_+^T = L_+ D_+ V_+ V_+^T D_+ L_+^T = L_+ D_+^2 L_+^T,$$

and so we have a cheap stable computation for the Cholesky or $LDL^T$ factorization of $H_+$ from the corresponding factorization of $H_c$.
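The idea of this section can be illustrated with a short sketch. It assumes the usual factored BFGS form $J_+ = L_c + (y - L_c v) v^T / (v^T v)$ with $v = (y^T s / s^T H_c s)^{1/2} L_c^T s$, which is our reading of what (5.1) expresses. It also replaces the $O(n^2)$ updating algorithms of [12] with a dense QR factorization from NumPy, which costs $O(n^3)$ and serves only to show that $L_+$ is obtained without ever forming $H_+$ or accumulating $Q_+$. The function name is hypothetical.

```python
import numpy as np


def bfgs_cholesky_update(L_c, s, y):
    """Return lower triangular L_plus with L_plus L_plus^T = H_plus (BFGS).

    Assumption: J_plus = L_c + w z^T with the factored BFGS choice of w and z.
    A dense QR stands in for the O(n^2) updating schemes of [12].
    """
    Lts = L_c.T @ s
    ys = y @ s
    if ys <= 0.0:
        raise ValueError("need y^T s > 0")
    v = np.sqrt(ys / (Lts @ Lts)) * Lts      # then v^T v = y^T s
    w = (y - L_c @ v) / ys
    J_plus = L_c + np.outer(w, v)            # J_plus = L_c + w z^T with z = v
    # J_plus^T = Q R implies J_plus = R^T Q^T, so L_plus = R^T and Q is never needed.
    R = np.linalg.qr(J_plus.T, mode="r")
    signs = np.where(np.diag(R) < 0.0, -1.0, 1.0)
    return (signs[:, None] * R).T            # flip rows of R so the diagonal is positive
```

Only the triangular factor is requested from the QR routine, mirroring the remark that $Q_+$ need not be accumulated.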
Appendix:  The  scaled  BFGS  derivation


If we carry through the first derivation of Section 2 with scaling matrices, then we consider:

1. Assuming we know $v$, choose $J_+$ to solve

$$\min_{J_+ \in Q(y,v)} \|W_L (J_+ - L_c) W_R\|_F.$$

2. Solve for $v$ so that $J_+^T s = v$.


The solution is independent of $W_L$ and depends on $W_R$ through $M = W_R^{-T} W_R^{-1}$. As noted in Section 3, step 1 gives

$$J_+ = L_c + \frac{(y - L_c v)(Mv)^T}{v^T M v} \tag{A.1}$$

and step 2 gives

$$v = J_+^T s = L_c^T s + Mv \left(\frac{y^T s - v^T L_c^T s}{v^T M v}\right). \tag{A.2}$$
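Step 1 can be illustrated numerically. The sketch below evaluates (A.1) for a given $v$ and provides a helper for the weighted Frobenius norm, so that one can compare $J_+$ against other members of $Q(y,v)$. The function names are hypothetical, and $W_R$ is assumed nonsingular.

```python
import numpy as np


def weighted_least_change(L_c, y, v, W_R):
    """Evaluate (A.1): the member of Q(y, v) nearest L_c in the weighted norm."""
    W_inv = np.linalg.inv(W_R)
    M = W_inv.T @ W_inv                      # M = W_R^{-T} W_R^{-1}
    Mv = M @ v
    return L_c + np.outer(y - L_c @ v, Mv) / (v @ Mv)


def weighted_norm(E, W_L, W_R):
    """Frobenius norm of W_L E W_R."""
    return np.linalg.norm(W_L @ E @ W_R, "fro")
```

Any other matrix in $Q(y,v)$ has the form $J_+ + E$ with $Ev = 0$. Comparing `weighted_norm(J_plus + E - L_c, W_L, W_R)` with `weighted_norm(J_plus - L_c, W_L, W_R)` for random such $E$ and several choices of $W_L$ should show that the former is never smaller, consistent with the solution being independent of $W_L$.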


From (A.2),

$$Mv = \alpha (v - L_c^T s) \tag{A.3}$$

for some scalar $\alpha$, and substituting this into (A.2),

$$v = L_c^T s + (v - L_c^T s) \left(\frac{y^T s - v^T L_c^T s}{v^T v - v^T L_c^T s}\right),$$

which is satisfied if and only if

$$v^T v = y^T s \ne v^T L_c^T s. \tag{A.4}$$


Substituting (A.3) into (A.1),

$$J_+ = L_c + \frac{(y - L_c v)(v - L_c^T s)^T}{(v - L_c^T s)^T v},$$

and so using (A.4) and doing some rearranging of terms, we find that the solution to our procedure is

$$H_+ = J_+ J_+^T = H_c + \frac{(y - H_c s) w^T + w (y - H_c s)^T}{w^T s} - \frac{(y - H_c s)^T s}{(w^T s)^2}\, w w^T, \tag{A.5}$$

where

$$w = y - L_c v \tag{A.6}$$

and $v$ satisfies

$$v = \left(I - (1/\alpha) M\right)^{-1} L_c^T s \tag{A.7}$$

for some scalar $\alpha$ such that

$$v^T v = y^T s. \tag{A.8}$$

If $L_c^T s$ is an eigenvector of $M$, we have that

$$v = \left(\frac{y^T s}{s^T H_c s}\right)^{1/2} L_c^T s$$

and the solution is again the BFGS update. The reader can also verify that if

M = β [I +

for any positive definite $H \in Q(y,s)$ and any positive scalar $\beta$, then $M$ is positive definite and the DFP update results from (A.5-8). In fact, if $M$ is any matrix of the form

$$M = \beta_1 I + \beta_2 L_c^T H^{-1} L_c,$$

where $H$ is defined as above, and $\beta_1$ and $\beta_2$ are positive scalars, then $M$ is positive definite and an update from the Broyden class results.

In general, if $y^T s > y^T H_c^{-1} y$, it can be seen from (A.6-8) that $w$ can have any direction, and we have the same class of updates as we derived with the DFP derivation with scaling matrices. If $y^T s < y^T H_c^{-1} y$, we have a subset of this class.
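The rearrangement leading to (A.5) uses only (A.6) and the normalization (A.8), so it can be checked for any $v$ with $v^T v = y^T s$ and $w^T s \ne 0$. The following sketch forms $J_+$ from (A.1) with (A.3) substituted and, separately, $H_+$ from (A.5). The function name is hypothetical and the check is illustrative only.

```python
import numpy as np


def scaled_update(L_c, s, y, v):
    """Return (J_plus, H_plus) for a v satisfying v^T v = y^T s.

    J_plus comes from (A.1) after substituting (A.3); H_plus is formula (A.5).
    """
    H_c = L_c @ L_c.T
    u = v - L_c.T @ s
    w = y - L_c @ v                           # (A.6)
    J_plus = L_c + np.outer(w, u) / (u @ v)   # u^T v equals w^T s when (A.8) holds
    ws = w @ s
    r = y - H_c @ s
    H_plus = (H_c + (np.outer(r, w) + np.outer(w, r)) / ws
              - (r @ s) / ws**2 * np.outer(w, w))
    return J_plus, H_plus
```

For such a $v$, `J_plus @ J_plus.T` should agree with `H_plus`, and `H_plus @ s` should reproduce `y`.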


